Context-dependent modeling and speaker normalization applied to reservoir-based phone recognition
Authors
Abstract
Reservoir Computing (RC) has recently been introduced as an alternative approach to acoustic modeling, with promising results on phone recognition and continuous digit recognition. In this work, we extend the approach by porting to Reservoir Computing several well-known techniques used to improve the recognition rates of GMM-based models. In particular, we introduce context-dependent (CD) triphone states to model co-articulation and the pronunciation mismatches arising from an imperfect lexicon. We also propose incorporating two speaker normalization methods in the feature space: mean and variance normalization, and vocal tract length normalization. The impact of these techniques is studied for phone recognition on the TIMIT corpus. Our CD-RC-HMM hybrid yields a speaker-independent phone error rate (PER) of 22% and a speaker-dependent PER of 20.5%. Combining GMM-based and RC-based likelihoods at the state level reduces these scores further.
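As a rough illustration of the first of the two feature-space normalizations named in the abstract, the sketch below applies per-dimension mean and variance normalization to a set of feature frames assumed to come from a single speaker. The function name `mean_variance_normalize`, the `eps` guard, and the toy data are hypothetical and chosen only for this example; the abstract does not specify how the authors compute their statistics (per utterance or per speaker, or on which features).

```python
from statistics import fmean, pstdev


def mean_variance_normalize(frames, eps=1e-8):
    """Scale each feature dimension to zero mean and unit variance.

    frames: list of equal-length feature vectors (one per frame),
    all assumed to belong to the same speaker.
    """
    if not frames:
        return []
    # Transpose so that each tuple holds one feature dimension over time.
    dims = list(zip(*frames))
    means = [fmean(d) for d in dims]
    # Population standard deviation; eps guards against constant dimensions.
    stds = [max(pstdev(d), eps) for d in dims]
    return [
        [(x - m) / s for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


# Toy data: three 2-dimensional frames from one hypothetical speaker.
speaker_frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = mean_variance_normalize(speaker_frames)
```

After normalization, every feature dimension of `normalized` has zero mean and unit variance over the given frames, which removes speaker-specific offsets and scale differences before the features reach the acoustic model.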
Similar resources
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the blending of adjacent sounds, is one of the main obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...
Convolutional neural network with adaptive windows for speech recognition
Although speech recognition systems are widely used and their accuracy keeps improving, a considerable gap remains between their performance and human recognition ability. This is partly due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
Speaker normalization training for mixture stochastic trajectory model
In this paper we are interested in speaker and environment adaptation techniques for speaker-independent (SI) continuous speech recognition. These techniques are used to reduce the mismatch between training and testing conditions, using a small amount of adaptation data. In addition to reducing this mismatch during the adaptation, we propose to reduce the variation due to speakers or environmen...
A Comparison of Normalization and Training Approaches for ASR-Dependent Speaker Identification
In this paper we discuss a speaker identification approach, called ASR-dependent speaker identification, that incorporates phonetic knowledge into the models for each speaker. This approach differs from traditional methods for performing text-independent speaker identification, such as global Gaussian mixture modeling, that typically ignore the phonetic content of the speech signal. We introduce...